Standard Memory Hierarchy Does Not Fit Simultaneous Multithreading
Author
Abstract
Simultaneous multithreading (SMT) is a promising approach to maximizing performance by improving processor utilization. We investigate the behavior of the memory hierarchy under SMT. First, we show that ignoring L2 cache contention leads to a strong overestimate of the achievable performance and may lead to incorrect conclusions. We then explore the impact of various memory hierarchy parameters. We show that the number of supported threads has to be set according to the cache size, that the L1 caches have to be associative, and that small blocks have to be used. Consequently, the hardware constraints on memory hierarchy design should limit the interest of SMT to a few threads.
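The interaction between thread count, cache size, and associativity described in the abstract can be illustrated with a toy model. The sketch below is not the paper's simulator or methodology: it is a minimal LRU set-associative cache shared by threads that each loop over a private working set, with accesses interleaved round-robin. All names (SetAssociativeCache, shared_cache_miss_rate, REGION) and all sizes are hypothetical, and the access pattern is deliberately worst-case so that the effect is easy to see.

```python
from collections import OrderedDict

# Hypothetical spacing between the threads' private address regions, in blocks.
# It is a multiple of every set count used below, so block i of every thread
# maps to the same set (a deliberately adversarial layout).
REGION = 1 << 20


class SetAssociativeCache:
    """Toy cache indexed by block address, with LRU replacement per set."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, block_addr):
        lines = self.sets[block_addr % self.num_sets]
        if block_addr in lines:
            lines.move_to_end(block_addr)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(lines) >= self.ways:
                lines.popitem(last=False)  # evict the least recently used block
            lines[block_addr] = True


def shared_cache_miss_rate(num_threads, ways, cache_blocks=256,
                           blocks_per_thread=64, passes=50):
    """Miss rate when num_threads share one cache of cache_blocks blocks.

    Each thread sweeps its own blocks_per_thread blocks repeatedly; the
    threads' accesses are interleaved one block at a time.
    """
    cache = SetAssociativeCache(cache_blocks // ways, ways)
    for _ in range(passes):
        for i in range(blocks_per_thread):
            for t in range(num_threads):
                cache.access(t * REGION + i)
    return cache.misses / (cache.hits + cache.misses)


if __name__ == "__main__":
    for ways in (1, 4):
        for threads in (1, 2, 4, 8):
            rate = shared_cache_miss_rate(threads, ways)
            print(f"ways={ways} threads={threads} miss_rate={rate:.2f}")
```

In this model a direct-mapped cache (ways=1) sees only cold misses with one thread but misses on every access as soon as a second thread is added, while the 4-way cache of the same capacity tolerates up to four threads and thrashes at eight. Real workloads are far less regular, so the numbers only indicate the direction of the effect, not its size.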
Similar resources
Memory Subsystem Design for Multithreaded Processors
Multithreading processors pose new challenges and new opportunities for cache/memory hierarchy design. Multithreading significantly alters the data reference stream seen by the memory subsystem. Multithreading also demands very different performance characteristics from the cache hierarchy than a typical (uniprocessor) CPU. This paper is specifically concerned with memory hierarchy design consi...
Memory Hierarchy Studies of Multimedia-enhanced Simultaneous Multithreaded Processors for MPEG-2 Video Decompression
This paper explores cache models for a simultaneous multithreaded processor with multimedia enhancements. We start with a wide-issue superscalar processor, enhance it by the simultaneous multithreading (SMT) technique, by multimedia units, and by an additional on-chip RAM storage. Our workload is a multithreaded MPEG-2 video decompression algorithm that extensively uses multimedia units. Variou...
Supporting Fine-Grained Synchronization on a Simultaneous Multithreading Processor
Existing multiprocessor synchronization mechanisms are relatively heavyweight, due in part to the level of the memory hierarchy (typically main memory) at which threads must synchronize. Multithreaded processors, on the other hand, have the potential to significantly reduce synchronization cost, because threads share the processor simultaneously and can synchronize using processor-internal stat...
Increasing data reuse of sparse algebra codes on simultaneous multithreading architectures
This paper studies the locality of sparse algebra codes on simultaneous multithreading architectures. In this kind of architecture, many hardware structures are dynamically shared among the running threads. This puts a lot of stress on the memory hierarchy, and poor locality, both inter-thread and intra-thread, may become a major bottleneck in the performance of a code. T...
Efficient Sampling Startup for Uniprocessor and Simultaneous Multithreading Simulation
Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry standard benchmark can take weeks to months. Statistical sampling and techniques like SimPoint that pick small sets of execution samples have been shown to provide accurate results while significantly reducing simulation time. The inefficiencies in sampling are (a) needing t...